Sparse reward exploration mechanism fusing curiosity and policy distillation
Ziteng WANG, Yaxin YU, Zifang XIA, Jiaqi QIAO
Journal of Computer Applications    2023, 43 (7): 2082-2090.   DOI: 10.11772/j.issn.1001-9081.2022071116

Deep reinforcement learning algorithms struggle to learn an optimal policy through interaction with the environment when rewards are sparse, so intrinsic rewards need to be constructed to guide algorithm updates. However, this approach still has several problems: 1) statistical inaccuracy in state classification misjudges the reward value, causing the agent to learn wrong behaviors; 2) because the prediction network identifies state information too well, the state freshness reflected by the intrinsic reward decreases, which harms learning of the optimal policy; 3) because of random state transitions, the information in teacher policies is not effectively utilized, which reduces the agent's ability to explore the environment. To solve these problems, a reward construction mechanism combining the prediction error of a randomly generated network with hash discretization counts, namely RGNP-HCE (Randomly Generated Network Prediction and Hash Count Exploration), was proposed, and the knowledge of multiple teacher policies was transferred to a student policy through distillation. In the RGNP-HCE mechanism, a fused reward was constructed based on the idea of classifying curiosity: a global curiosity reward was built from the randomly generated network's prediction error across episodes, and a local curiosity reward was built from hash discretization counts within a single episode, which guarantees the rationality of the intrinsic reward and the correctness of the policy-gradient update. In addition, multi-teacher policy distillation gives the student policy multiple reference directions for exploration, effectively improving its ability to explore the environment. Finally, the proposed mechanism was compared with four mainstream deep reinforcement learning algorithms in the Montezuma's Revenge and Breakout test environments, and policy distillation was then performed. The results show that the average performance of the RGNP-HCE mechanism in both test environments exceeds that of current high-performance deep reinforcement learning algorithms, and the distilled student policy improves average performance further, indicating that the RGNP-HCE mechanism and policy distillation effectively improve the agent's exploration ability.
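
The abstract describes a fused intrinsic reward: a cross-episode (global) curiosity term from the prediction error of a randomly generated network, plus a within-episode (local) curiosity term from hash-discretized state counts. The sketch below illustrates this idea under stated assumptions; it is not the authors' code, and the class and parameter names (RGNPHashReward, beta_global, beta_local) are illustrative.

```python
import numpy as np
import torch
import torch.nn as nn


class RGNPHashReward:
    """Sketch of an RGNP-HCE-style intrinsic reward (illustrative, not the paper's code)."""

    def __init__(self, obs_dim, emb_dim=64, hash_bits=16,
                 beta_global=1.0, beta_local=0.5, lr=1e-3):
        # Fixed, randomly initialised target network (never trained).
        self.target = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                    nn.Linear(128, emb_dim))
        for p in self.target.parameters():
            p.requires_grad_(False)
        # Predictor network trained to imitate the random target;
        # its error serves as the global (cross-episode) curiosity signal.
        self.predictor = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                       nn.Linear(128, emb_dim))
        self.opt = torch.optim.Adam(self.predictor.parameters(), lr=lr)
        # Random projection used for SimHash-style discretisation of states.
        self.proj = np.random.randn(hash_bits, obs_dim)
        self.beta_global, self.beta_local = beta_global, beta_local
        self.episode_counts = {}

    def new_episode(self):
        # Local counts are reset each episode, restoring state "freshness".
        self.episode_counts = {}

    def intrinsic_reward(self, obs):
        x = torch.as_tensor(obs, dtype=torch.float32).unsqueeze(0)
        # Global curiosity: predictor error against the fixed random network.
        err = ((self.predictor(x) - self.target(x)) ** 2).mean()
        self.opt.zero_grad()
        err.backward()
        self.opt.step()
        # Local curiosity: inverse square root of this episode's hash-bucket count.
        code = tuple((self.proj @ np.asarray(obs, dtype=float) > 0).astype(int))
        self.episode_counts[code] = self.episode_counts.get(code, 0) + 1
        local = 1.0 / np.sqrt(self.episode_counts[code])
        return self.beta_global * float(err.detach()) + self.beta_local * local
```

In a training loop, the fused reward would typically be added to the extrinsic reward, e.g. r_total = r_ext + model.intrinsic_reward(obs), with new_episode() called at each episode boundary before the policy-gradient update.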

Explainable recommendation mechanism fusing collaborative knowledge graph and counterfactual inference
Zifang XIA, Yaxin YU, Ziteng WANG, Jiaqi QIAO
Journal of Computer Applications    2023, 43 (7): 2001-2009.   DOI: 10.11772/j.issn.1001-9081.2022071113

To construct a transparent and trustworthy recommendation mechanism, existing research mainly provides reasonable explanations for personalized recommendation through explainable recommendation mechanisms. However, existing explainable recommendation mechanisms have three major limitations: 1) using correlations alone can only provide rational explanations rather than causal explanations, and using paths to provide explanations risks privacy leakage; 2) the problem of sparse user feedback is ignored, so the fidelity of explanations is hard to guarantee; 3) the granularity of explanations is relatively coarse, and users' personalized preferences are not considered. To solve these problems, an explainable recommendation mechanism based on a Collaborative Knowledge Graph (CKG) and counterfactual inference, named ERCKCI, was proposed. Firstly, based on the user's own behavior sequence, counterfactual inference was used to achieve causal decorrelation under high sparsity by exploiting causal relations, and counterfactual explanations were derived iteratively. Secondly, to improve the fidelity of explanations, the CKG and the neighborhood propagation mechanism of a Graph Neural Network (GNN) were used to learn user and item representations on a single time slice, and users' long- and short-term preferences were captured through a self-attention mechanism over multiple time slices to enhance the user preference representation. Finally, multi-granularity personalized preferences of users were captured via a higher-order connected subgraph of the counterfactual set to enhance the counterfactual explanations. To verify the effectiveness of the ERCKCI mechanism, comparison experiments were performed on the public datasets MovieLens(100k), Book-Crossing and MovieLens(1M). The results show that, compared with the Explainable recommendation based on Counterfactual Inference (ECI) algorithm under the Relational Collaborative Filtering (RCF) recommendation model on the first two datasets, the proposed mechanism improves explanation fidelity by 4.89 and 3.38 percentage points respectively, reduces the size of the counterfactual (CF) set by 63.26% and 66.24% respectively, and improves the sparsity index by 1.10 and 1.66 percentage points respectively; therefore, the proposed mechanism effectively improves explainability.
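
The abstract's first step is to derive counterfactual explanations from the user's own behavior sequence: find a small set of past interactions whose removal would change the recommendation. The sketch below shows one generic way such a counterfactual set could be searched for; it is an assumption-laden illustration rather than the ERCKCI implementation, and score_fn is a hypothetical stand-in for the CKG/GNN recommendation model.

```python
from typing import Callable, Dict, Hashable, List


def counterfactual_explanation(history: List[Hashable],
                               target_item: Hashable,
                               score_fn: Callable[[List[Hashable]], Dict[Hashable, float]],
                               max_size: int = 5) -> List[Hashable]:
    """Greedily pick past interactions whose removal demotes target_item (illustrative sketch)."""

    def top_item(h: List[Hashable]) -> Hashable:
        scores = score_fn(h)
        return max(scores, key=scores.get)

    cf_set: List[Hashable] = []
    remaining = list(history)
    while len(cf_set) < max_size and remaining:
        base = score_fn(remaining).get(target_item, 0.0)
        # Choose the interaction whose removal lowers the target item's score the most.
        best, best_drop = None, 0.0
        for it in remaining:
            reduced = [x for x in remaining if x != it]
            drop = base - score_fn(reduced).get(target_item, 0.0)
            if drop > best_drop:
                best, best_drop = it, drop
        if best is None:
            break
        cf_set.append(best)
        remaining = [x for x in remaining if x != best]
        # Stop once the recommendation actually changes, i.e. the counterfactual holds.
        if top_item(remaining) != target_item:
            break
    return cf_set
```

A smaller counterfactual set (as measured in the experiments) corresponds to a more concise explanation: fewer of the user's own interactions are needed to account for the recommendation.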
